
Trends in Hearing

SAGE Publications

Preprints posted in the last 90 days, ranked by how well they match Trends in Hearing's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Testing differential effects of periodicity and predictability in auditory rhythmic cueing of concurrent speech

MacLean, J.; Zhou, M.; Bidelman, G.

2026-03-13 neuroscience 10.64898/2026.03.11.711109 medRxiv
Top 0.1%
8.1%

Entrainment and predictive coding aid speech perception in both quiet and noisy environments. Isochronous, periodic auditory rhythmic cues facilitate entrainment and temporal expectations which can benefit encoding and perception of target speech. However, most studies using isochronous cues confound periodicity with predictability. To this end, we characterized how systematic changes in the acoustic dimensions of stimulus rate, target phase, periodicity, and predictability of an entraining sound precursor impact the subsequent identification of concurrent speech targets. Target concurrent vowel pairs were preceded by rhythmic woodblock cues which were either periodic-predictable (PP, isochronous rhythm), aperiodic-predictable (AP, accelerating rhythm), or aperiodic-unpredictable (AU, random rhythm). The number of pulses per rhythm was roved to further manipulate predictability. Stimuli also varied in presentation rate (2.5, 4.5, 6.5 Hz) and target speech phase (in-phase, 0°; out-of-phase, 90°, 180°) relative to the preceding entraining rhythm. We also measured participants' musical pulse continuation and standardized speech-in-noise perception abilities. We did not observe any effects of stimulus rhythm, rate, or target phase on target speech identification accuracy. However, reaction times were slowest at the nominal speech rate (4.5 Hz) and were most disrupted by out-of-phase presentations following the PP rhythm. Double-vowel task performance was associated with stronger musical pulse continuation abilities, but not speech-in-noise perception. Our results support the notion that entraining rhythmic cues rely on top-down processing but are relatively muted when stimulus predictability is unknown. Additionally, we find that individual differences in musical pulse perception may underlie the benefits of rhythmic cueing on subsequent speech perception.
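
To make the three precursor conditions concrete, here is a minimal sketch of how the pulse onset times could be generated; the rates and phase offsets come from the abstract, but the specific acceleration profile and jitter range are assumptions, not the authors' parameters.

```python
import numpy as np

def rhythm_onsets(kind: str, rate_hz: float = 4.5, n_pulses: int = 5,
                  rng: np.random.Generator | None = None) -> np.ndarray:
    """Pulse onset times (s) for one precursor rhythm."""
    rng = rng or np.random.default_rng()
    base_ioi = 1.0 / rate_hz                       # nominal inter-onset interval
    if kind == "PP":                               # periodic-predictable: isochronous
        iois = np.full(n_pulses - 1, base_ioi)
    elif kind == "AP":                             # aperiodic-predictable: accelerating
        iois = base_ioi * np.linspace(1.3, 0.7, n_pulses - 1)
    elif kind == "AU":                             # aperiodic-unpredictable: random
        iois = rng.uniform(0.5 * base_ioi, 1.5 * base_ioi, size=n_pulses - 1)
    else:
        raise ValueError(kind)
    return np.concatenate(([0.0], np.cumsum(iois)))

# Target phase is expressed as a fraction of the entraining cycle,
# e.g. a 90-degree offset one cycle after the final pulse at the 4.5 Hz rate:
onsets = rhythm_onsets("PP")
target_time = onsets[-1] + 1.0 / 4.5 * (1 + 90 / 360)
```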

2
Discrimination of spectrally sparse complex-tone triads in cochlear implant listeners

Augsten, M.-L.; Lindenbeck, M. J.; Laback, B.

2026-03-24 neuroscience 10.64898/2026.03.20.712905 medRxiv
Top 0.1%
6.8%

Cochlear implant (CI) users typically experience difficulties perceiving musical harmony due to a restricted spectro-temporal resolution at the electrode-nerve interface, resulting in limited pitch perception. We investigated how stimulus parameters affect discrimination of complex-tone triads (three-voice chords), aiming to identify conditions that maximize perceptual sensitivity. Six post-lingually deafened CI listeners completed a same/different task with harmonic complex tones, while spectral complexity, voice(s) containing a pitch change, and temporal synchrony (simultaneous vs. sequential triad presentation) were manipulated. CI listeners discriminated harmonically relevant one-semitone pitch changes within triads when spectral complexity was reduced to three or five components per voice, with significantly better performance for three-component compared to nine-component tones. Sensitivity was observed for pitch changes in the high voice or in both high and low voices, but not for changes in only the low voice. Single-voice sensitivity predicted simultaneous-triad sensitivity when controlling for spectral complexity and voice with pitch change. Contrary to expectations, sequential triad presentation did not improve discrimination. An analysis of processor pulse patterns suggests that difference-frequency cues encoded in the temporal envelope rather than place-of-excitation cues underlie perceptual triad sensitivity. These findings support reducing spectral complexity to enhance chord discrimination for CI users based on temporal cues.
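
The spectral-complexity manipulation is easy to picture as signal synthesis: each voice is a harmonic complex with three, five, or nine components, and a simultaneous triad sums three such voices. A minimal sketch (sampling rate, equal amplitudes, and note frequencies are illustrative assumptions, not the study's stimuli):

```python
import numpy as np

def complex_tone(f0_hz: float, n_components: int,
                 dur_s: float = 0.5, fs: int = 44100) -> np.ndarray:
    """Harmonics 1..n of f0 at equal amplitude; fewer components = sparser spectrum."""
    t = np.arange(int(dur_s * fs)) / fs
    tone = sum(np.sin(2 * np.pi * f0_hz * (k + 1) * t) for k in range(n_components))
    return tone / n_components

# A simultaneous triad is the sum of three voices, e.g. a C major chord:
chord = sum(complex_tone(f, n_components=3) for f in (261.6, 329.6, 392.0))
```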

3
Can Multimodal Large Language Models Visually Interpret Auditory Brainstem Responses?

Jedrzejczak, W.; Kochanek, K.; Skarzynski, H.

2026-04-17 otolaryngology 10.64898/2026.04.15.26350944 medRxiv
Top 0.1%
6.4%

Introduction: Auditory brainstem response (ABR) is a standard objective method for estimating hearing threshold, especially in patients who cannot reliably participate in behavioral audiometry. However, ABR interpretation is usually performed by an expert. This study evaluated whether two general-purpose artificial intelligence (AI) multimodal large language model (LLM) chatbots, ChatGPT and Qwen, can accurately estimate ABR hearing thresholds from ABR waveform images. The accuracy was measured by comparisons with the judgements of 3 expert audiologists. Methods: A total of 500 images each containing several ABR waveforms recorded at different stimulus intensities were analyzed. Three expert audiologists established the reference auditory thresholds based on visual identification of wave V at the lowest stimulus intensity, with the most frequent judgment among the three used as the reference. Each waveform image was independently submitted to ChatGPT (version 5.1) and Qwen (version 3Max) using the same standardized prompt and without additional clinical context. Agreement with the expert thresholds was assessed as mean errors and correlations. Sensitivity and specificity for detecting hearing loss (>20 dB nHL) were also calculated. In cases where the AI and expert thresholds nominally matched, corresponding latency measures were also compared. Results: Auditory thresholds derived from both LLMs correlated strongly with expert opinion, with Pearson r = 0.954 for ChatGPT and r = 0.958 for Qwen. ChatGPT showed a mean error of +5.5 dB and Qwen showed a mean error of -2.7 dB. Exact nominal agreement with expert values was achieved in 34.6% of ChatGPT estimates and 35.6% of Qwen estimates; agreement within +/-10 dB was observed in 75.6% and 80.0% of cases, respectively. For hearing-loss classification, ChatGPT achieved 100% sensitivity but low specificity (20.4%), whereas Qwen showed a more balanced profile with 91.6% sensitivity and 67.5% specificity. Curiously, estimates of wave V latency were markedly poor for both LLMs, with systematic underestimation and weak correlations with the expert judgements. Conclusion: ChatGPT and Qwen demonstrated a moderate ability to estimate ABR thresholds from waveform images, although their performance was not good enough for independent clinical use. Both models captured general patterns of hearing loss severity, but there was systematic bias, limited specificity and sensitivity balance, and poor latency estimation. General-purpose multimodal LLMs may have potential as assistive or preliminary tools, but clinically reliable ABR interpretation will likely require specialized, domain-trained AI systems with expert oversight.
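
The reported agreement statistics are straightforward to reproduce from paired threshold estimates; a sketch under the assumption that expert and model thresholds are stored as equal-length arrays in dB nHL (variable names are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

def threshold_agreement(expert: np.ndarray, model: np.ndarray,
                        cutoff_db: float = 20.0) -> dict:
    err = model - expert
    r, _ = pearsonr(expert, model)
    # Hearing loss defined as threshold > 20 dB nHL, as in the abstract.
    hl_true, hl_pred = expert > cutoff_db, model > cutoff_db
    return {"mean_error_db": err.mean(), "pearson_r": r,
            "exact_match": np.mean(err == 0),
            "within_10db": np.mean(np.abs(err) <= 10),
            "sensitivity": np.mean(hl_pred[hl_true]),
            "specificity": np.mean(~hl_pred[~hl_true])}
```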

4
Modeling the Influence of Bandwidth and Envelope on Categorical Loudness Scaling

Neely, S. T.; Harris, S. E.; Hajicek, J. J.; Petersen, E. A.; Shen, Y.

2026-04-01 neuroscience 10.64898/2026.03.30.715393 medRxiv
Top 0.1%
4.8%

In a loudness-matching paradigm, a reduction in the loudness of sounds with bandwidths less than one-half octave compared to a tone of equal sound pressure level has been observed previously for five-tone complexes at 60 dB SPL centered at 1 kHz. Here, this loudness-reduction phenomenon is explored using band-limited noise across wide ranges of frequency and level. Additionally, these measurements are simulated by a model of loudness judgement based on neural ensemble averaging (NEA), which serves as a proxy for central auditory signal processing. Multi-frequency equal-loudness contours (ELC) were measured for each of the adult participants (N=100) with pure-tone average (PTA) thresholds that ranged from normal to moderate hearing loss using a categorical-loudness-scaling (CLS) paradigm. Presentation level and center frequency of the test stimuli were determined on each trial according to a Bayesian adaptive algorithm, which enabled multi-frequency ELC estimation within about five minutes of testing. Three separate test conditions differed by stimulus type: (1) pure-tone, (2) quarter-octave noise and (3) octave noise. For comparison, loudness judgements for all three stimulus types were also simulated by the NEA model, which comprised a nonlinear, active, time-domain cochlear model with an appended stage of neural spike generation. Mid-bandwidth loudness reduction was observed to be greatest at moderate stimulus levels and frequencies near 1 kHz. This feature was approximated by the NEA model, which suggests involvement of an early stage of the central auditory system in the formation of loudness judgements.
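
The Bayesian adaptive stimulus selection mentioned here can be sketched generically as a grid posterior over loudness-function parameters, with each trial's (level, frequency) chosen to maximize expected information gain; the grid, likelihood, and selection rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def next_stimulus(posterior: np.ndarray, likelihood: np.ndarray, stimuli: list):
    """posterior: (n_params,) grid posterior over ELC model parameters;
    likelihood[s, k, r] = P(CLS response r | stimulus s, parameters k)."""
    h0 = -np.sum(posterior * np.log(posterior + 1e-12))  # current entropy
    best, best_gain = None, -np.inf
    for s, stim in enumerate(stimuli):
        gain = 0.0
        for r in range(likelihood.shape[2]):
            joint = likelihood[s, :, r] * posterior
            p_r = joint.sum()
            if p_r > 0:
                post_r = joint / p_r
                h_r = -np.sum(post_r * np.log(post_r + 1e-12))
                gain += p_r * (h0 - h_r)                  # expected entropy drop
        if gain > best_gain:
            best, best_gain = stim, gain
    return best
```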

5
Impacts of heminode disruption on auditory processing of noisy sound stimuli

Tripathy, S.; Budak, M.; Maddox, R.; Mehta, A. H.; Roberts, M. T.; Corfas, G.; Booth, V.; Zochowski, M.

2026-02-04 neuroscience 10.64898/2026.02.02.703242 medRxiv
Top 0.1%
4.3%

Hidden hearing loss (HHL) is an auditory neuropathy characterized by altered auditory nerve responses despite normal hearing thresholds. Recent experimental and computational studies suggest that permanent disruptions to heminode positions in spiral ganglion neuron (SGN) fibers can contribute to these deficits. However, the interaction between heminode disruption and noisy backgrounds ubiquitous in daily listening remains unexplored. This study investigates how background noise affects auditory processing with these peripheral disorders and how deficits propagate to downstream sound localization circuits in the superior olivary complex. We developed computational models of SGN fibers with mild and severe degrees of heminode disruption, subjected to sinusoidal tone stimuli in the presence of background noise with varying spectral characteristics. We analyzed the phase-locking of SGN fiber responses to the stimulus tone and modeled the subsequent effects on interaural time difference (ITD) sensitivity in the medial superior olive (MSO) using a binaural localization network. We found that near-tone-frequency noise disrupted SGN phase locking through cycle-to-cycle variability in spike phases, with effects consistent across tone frequencies. Mild heminode disruption produced frequency-dependent degradation in SGN phase locking, with effects observed only at higher frequencies tested (600-1000 Hz), without reducing overall firing rates. Critically, the effects of noise and heminode disruption were additive, with combined exposure leading to reduced ITD sensitivity and large temporal fluctuations in MSO responses. Severe heminode disruption, which additionally reduced firing rates at the SGN fibers and subsequent stages, produced profound localization deficits across all frequencies tested. Thus, our model results suggest that noisy environments exacerbate auditory deficits from peripheral disorders implicated in HHL and could potentially impair speech intelligibility through degradation in localization ability. This model may be useful for understanding the downstream impacts of SGN neuropathies.
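
Phase locking of model SGN responses to a tone is conventionally quantified by vector strength; whether the authors used exactly this metric is an assumption, but it illustrates the quantity at stake:

```python
import numpy as np

def vector_strength(spike_times_s: np.ndarray, tone_freq_hz: float) -> float:
    """1.0 = spikes perfectly locked to one stimulus phase; 0.0 = phases uniform."""
    phases = 2 * np.pi * tone_freq_hz * spike_times_s
    return float(np.abs(np.mean(np.exp(1j * phases))))
```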

6
Peripheral phoneme encoding and discrimination in aging and hearing impairment

Wouters, M.; Gaudrain, E.; Dapper, K.; Schirmer, J.; Baskent, D.; Ruettiger, L.; Knipper, M.; Verhulst, S.

2026-01-28 neuroscience 10.64898/2026.01.27.702044 medRxiv
Top 0.1%
4.0%

Speech perception difficulties in noise are common among older adults and individuals with hearing impairment, even when audiometric thresholds appear normal. We examined how aging, cochlear synaptopathy (CS), and outer hair cell (OHC) damage affect speech encoding and phoneme discrimination. Envelope-following responses (EFRs) to rectangular amplitude-modulated (RAM) tones and speech-like phoneme pairs were recorded in quiet using EEG, and behavioral discrimination was assessed in quiet, ipsilateral, and contralateral noise. Stimuli were designed to target temporal envelope (TENV) or temporal fine structure (TFS) encoding. Results showed that RAM-EFR amplitudes decreased gradually with age, consistent with emerging CS, while magnitudes of high-frequency TENV-based EFRs in quiet were most reduced in older hearing-impaired listeners with combined CS and OHC damage. In contrast, EFRs targeting low-frequency TENV encoding in quiet remained preserved. Behaviorally, phoneme discrimination of TFS contrasts worsened with OHC loss and age in quiet and contralateral noise, respectively, while there was no significant effect of age on the discrimination of TENV contrasts. Considering that high-frequency contrasts are discriminated via place-based spectral cues, low-frequency contrasts rely on TFS, and the EFR reflects primarily TENV, this framework explains why EFRs decline for high-frequency cues without perceptual loss, while EFRs remain stable for low-frequency cues even as TFS-based discrimination deteriorates. These findings highlight the need for further investigation into how neural coding deficits relate to perceptual outcomes. Combining electrophysiological and behavioral measures might provide a sensitive framework for detecting subclinical auditory deficits to diagnose age-related and hidden hearing loss earlier.

Highlights:
- Speech-evoked EEG shows OHC loss-related decline of high-CF envelope encoding.
- Speech-evoked EEG shows low-CF envelope encoding stays intact with age.
- Fine-structure contrast discrimination worsens with OHC loss in quiet.
- Fine-structure contrast discrimination worsens with age in contralateral noise.
- High-frequency place-based spectral cue discrimination remains robust with age.
- Peripheral coding strength is not directly reflected at the behavioral level.
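
EFR magnitude is conventionally read out at the stimulus modulation frequency from the spectrum of the trial-averaged EEG; a minimal sketch with assumed input shapes (the authors' exact pipeline, e.g. artifact rejection and polarity handling, is omitted):

```python
import numpy as np

def efr_amplitude(epochs: np.ndarray, fs: float, f_mod_hz: float) -> float:
    """epochs: (n_trials, n_samples) EEG locked to the RAM tone."""
    avg = epochs.mean(axis=0)                        # average across trials
    spectrum = 2 * np.abs(np.fft.rfft(avg)) / avg.size
    freqs = np.fft.rfftfreq(avg.size, d=1.0 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - f_mod_hz))])
```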

7
Improving Automated Diagnosis of Middle and Inner Ear Pathologies by Estimating Middle Ear Input Impedance from Wideband Tympanometry

Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.

2026-03-31 otolaryngology 10.64898/2026.03.26.26349034 medRxiv
Top 0.1%
3.6%

Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
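
The classifier described maps directly onto a standard scikit-learn pipeline; a sketch using the stated design choices (PCA retaining 90% variance, regularized multinomial logistic regression, stratified 5-fold cross-validation), with the remaining hyperparameters assumed:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: per-ear features (ABGs plus |ZME| by frequency)
# y: labels in {normal, superior canal dehiscence, stapes fixation}
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),                    # keep 90% of variance
    LogisticRegression(C=1.0, max_iter=5000),  # regularized, multinomial
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# accuracy = cross_val_score(clf, X, y, cv=cv).mean()
```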

8
From sound to source: Human and model recognition of environmental sounds

Alavilli, S.; McDermott, J. H.

2026-03-14 neuroscience 10.64898/2026.03.12.711349 medRxiv
Top 0.1%
3.2%

Our ability to recognize sound sources in the world is critical to daily life, but is not well documented or understood in computational terms. We developed a large-scale behavioral benchmark of human environmental sound recognition, built stimulus-computable models of sound recognition, and used the benchmark to compare models to humans. The behavioral benchmark measured how sound recognition varied across source categories, audio distortions, and concurrent sound sources, all of which influenced recognition performance in humans. Artificial neural network models trained to recognize sounds in multi-source scenes reached near-human accuracy and qualitatively matched human patterns of performance in many conditions. By contrast, traditional models of the cochlea and auditory cortex that were trained to recognize sounds produced worse matches to human performance. Models trained on larger datasets exhibited stronger alignment with both human behavior and brain responses. The results suggest that many aspects of human sound recognition emerge in systems optimized for the problem of real-world recognition. The benchmark results set the stage for future explorations of auditory scene perception involving salience and attention.

9
Neural Correlates of Listening States, Cognitive Load, and Selective Attention in an Ecological Multi-Talker Scenario

Shahsavari Baboukani, P.; Ordonez, R.; Gravesen, C.; Ostergaard, J.; Rank, M. L.; Alickovic, E.; Cabrera, A. F.

2026-03-15 neuroscience 10.64898/2026.03.13.711289 medRxiv
Top 0.1%
2.7%

This study assessed neural responses to continuous speech to classify listening state, cognitive load, and selective auditory attention in complex acoustic environments. EEG was recorded while participants listened to concurrent male and female talkers under two conditions: active listening, where attention was directed to one of two competing speakers (target vs. masker), or passive listening, where attention was diverted to a visual task. Cognitive load was varied by manipulating the target-to-masker ratio (TMR: +7 dB, -7 dB), with lower TMR representing more demanding listening conditions. Spectral EEG features across frequency bands were ranked with univariate statistics and used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed using linear stimulus reconstruction to identify the target talker during active listening. Classification of listening state achieved 90.3% accuracy, and AAD reached 84.4% accuracy, demonstrating robust tracking of attentional engagement. In contrast, classification of cognitive load was near chance, suggesting that more extreme acoustic manipulations may be required to elicit distinct neural signatures. Comparable performance using a reduced set of electrodes near the ear indicates the potential for integration with wearable hearing devices. Overall, these results demonstrate that EEG can distinguish attentional states and selectively track target speech in realistic auditory scenarios. The findings provide a foundation for future applications in monitoring listening behavior, supporting auditory processing, and improving brain-controlled hearing aids in complex acoustic environments.

Highlights:
- Listening state (active vs. passive) can be classified from EEG spectral features.
- Attended speech can be decoded by reconstructing speech envelopes from EEG.
- Comparable accuracy is achieved using only electrodes placed around the ears.
- EEG can monitor listening state and track auditory attention in two-speaker settings.

Graphical Abstract: EEG signals were recorded while participants listened to two concurrent speech streams, either by actively attending to one speaker or by focusing on an unrelated visual task. Spectral features of the EEG were used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed by reconstructing the speech envelope from the EEG time signal.

[Figure 1: Classification of listening state (active vs. passive), 90.3% accuracy. EEG difference between active and passive listening: left, power spectrum; right, topographic map (alpha band, 8-12 Hz). Classification of cognitive load (low vs. high TMR), near chance level. EEG difference between low and high TMR: left, power spectrum; right, topographic map (alpha band, 8-12 Hz).]
[Figure 2: AAD achieved 84.4% accuracy, indicating robust decoding of the attended speaker during active listening, while performance dropped to near chance during passive listening.]
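
Linear stimulus reconstruction, the AAD method named here, is typically a ridge-regression "backward model" from time-lagged EEG to the speech envelope; a compact sketch with assumed lags and regularization (in practice the decoder is trained on held-out trials, which is why it is passed in separately):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged copies: (n_samples, n_channels) -> (n_samples, n_channels * n_lags)."""
    return np.concatenate([np.roll(eeg, lag, axis=0) for lag in range(n_lags)], axis=1)

def decode_attention(eeg, env_target, env_masker, decoder: Ridge, n_lags: int = 32) -> str:
    recon = decoder.predict(lagged(eeg, n_lags))   # reconstructed envelope
    r_t = np.corrcoef(recon, env_target)[0, 1]
    r_m = np.corrcoef(recon, env_masker)[0, 1]
    return "target" if r_t > r_m else "masker"

# Training (on separate trials):
# decoder = Ridge(alpha=1e3).fit(lagged(eeg_train, 32), env_attended_train)
```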

10
Speech-in-Noise Difficulties in Aminoglycoside Ototoxicity Reflects Combined Afferent and Efferent Dysfunction

Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.

2026-03-26 otolaryngology 10.64898/2026.03.23.26348719 medRxiv
Top 0.1%
2.7%

Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech-in-noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF, treated with AGs, are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) threshold (afferent) and growth functions (efferent). Design: A total of 185 participants were enrolled; 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli. Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group. Increased MEMR growth was also significantly associated with poorer SiN in the CF group. Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects due to ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.

11
Chronic acoustic degradation via cochlear implants alters predictive processing of audiovisual speech

Gastaldon, S.; Gheller, F.; Bonfiglio, N.; Brotto, D.; Bottari, D.; Trevisi, P.; Martini, A.; Vespignani, F.; Peressotti, F.

2026-01-27 neuroscience 10.64898/2026.01.25.701504 medRxiv
Top 0.1%
2.6%

This study provides the first neurophysiological evidence of how cochlear implant (CI) input affects predictive processing during audiovisual language comprehension in deaf individuals. Using EEG, we compared 18 CI users with 18 normal-hearing (NH) controls during sentence comprehension where final word predictability was determined by high or low semantic constraint (HC vs. LC) of the preceding sentence frame. Between the sentence frame and the final word, an 800 ms silent gap was introduced. Mouth visibility was manipulated during sentence frames (visible or digitally occluded; V+ vs. V-), while the final words were always presented with the mouth visible. In NH participants, lower-beta power (12-15 Hz) in left frontal and central sensors decreased for HC vs. LC contexts during the pre-target silent gap, but only when the mouth was visible, suggesting active prediction generation. In CI users, this lower-beta power decrease was absent. After final word presentation, both groups showed N400 predictability effects, indicating preserved prediction evaluation. However, CI users exhibited extended N400 effects in the V+ condition, suggesting additional processing demands. Across all participants, pre-target beta modulations correlated with language production abilities, supporting prediction-by-production frameworks. Within CI users, poorer audiometric thresholds correlated with larger N400 constraint effects, possibly indicating greater reliance on contextual prediction to compensate for degraded sensory input. These findings demonstrate that CI-mediated perception alters the neural mechanisms of prediction generation. The link between production skills and predictive mechanisms suggests that strengthening expressive language abilities may enhance predictive processing in CI users.

12
Acoustic Salience Drives Pupillary Dynamics in an Interrupted, Reverberant Task

Figarola, V.; Liang, W.; Luthra, S.; Parker, E.; Winn, M.; Brown, C.; Shinn-Cunningham, B. G.

2026-04-02 neuroscience 10.64898/2026.03.31.715639 medRxiv
Top 0.1%
1.9%

Listeners face many challenges when trying to maintain attention to a target source in everyday settings; for instance, reverberation distorts acoustic cues and interruptions capture attention. However, little is known about how these challenges affect the ability to maintain selective attention. Here, we measured syllable recall accuracy and pupil dilation during a spatial selective attention task that was sometimes disrupted. Participants heard two competing, temporally interleaved syllable streams presented in pseudo-anechoic or reverberant environments. On randomly selected trials, a sudden interruption occurred mid-sequence. Compared to anechoic trials, reverberant performance was worse overall, and the interrupter disrupted performance. In uninterrupted trials, reverberation reduced peak pupil dilation both when it was consistent across all stimuli in a block and when it was randomized trial to trial, suggesting temporal smearing reduced clarity of the scene and the salience of events in the ongoing streams. Pupil dilations in response to interruptions indicated perceptual salience was strong across reverberant and anechoic conditions. Specifically, baseline pupil size before trials did not vary across room conditions, and mixing or blocking of trials (altering stimulus expectations) had no impact on pupillary responses. Together, these findings highlight that stimulus salience drives cognitive load more strongly than does task performance.
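
Peak pupil dilation is conventionally computed relative to a pre-trial baseline; a minimal sketch with an assumed epoch layout (the authors' preprocessing, e.g. blink interpolation, is omitted):

```python
import numpy as np

def peak_dilation(pupil: np.ndarray, fs: float, baseline_s: float = 1.0) -> np.ndarray:
    """pupil: (n_trials, n_samples) epochs starting baseline_s before trial onset."""
    n_base = int(baseline_s * fs)
    baseline = pupil[:, :n_base].mean(axis=1, keepdims=True)
    return (pupil[:, n_base:] - baseline).max(axis=1)   # per-trial peak change
```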

13
The cortical contribution to the speech-FFR is not modulated by visual information

Riegel, J.; Schüller, A.; Wissmann, A.; Zeiler, S.; Kolossa, D.; Reichenbach, T.

2026-01-27 neuroscience 10.64898/2026.01.26.701703 medRxiv
Top 0.1%
1.8%

Seeing a speaker's face can significantly aid understanding, particularly in challenging acoustic environments. An early neural response implicated in audiovisual speech processing is the frequency-following response (speech-FFR), which occurs at the fundamental frequency of the speech signal. This response arises from both subcortical areas and the auditory cortex. Previous studies have shown that subcortical responses are reduced when bimodal stimulation includes visual input from the talker's face. Here, we examined the cortical contribution to the speech-FFR and its potential modulation by visual information. We recorded MEG responses to four types of audiovisual signals: a still image, an artificially generated avatar, a degraded video, and a natural video. The audio stimuli were presented in a substantial level of background noise to make behavioral audiovisual effects stand out. Speech-in-noise comprehension increased significantly from the audio-only condition to the avatar and the degraded video, and further to the natural video. Moreover, we found that all types of audiovisual stimuli yielded robust speech-FFRs in the auditory cortex at an early latency of around 30 ms. However, the magnitude of this neural response was neither enhanced nor attenuated by the videos, nor could the cortical contribution to the speech-FFR explain a significant portion of the variance in the behavioral comprehension scores. Our results suggest that visual modulation of the speech-FFR in the auditory cortex is, if existent, too small to be measurable in scenarios where speech occurs in considerable background noise.

14
Sound lateralization Ability is affected by saccade direction but not Eye Movement-Related Eardrum Oscillations (EMREOs)

Sotero Silva, N.; Bröhl, F.; Kayser, C.

2026-02-05 neuroscience 10.1101/2025.11.05.686724 medRxiv
Top 0.1%
1.7%

Eye-movement-related eardrum oscillations (EMREOs) are pressure changes recorded in the ear that supposedly reflect displacements of the tympanic membrane induced by saccadic eye movements. Previous studies hypothesized that the underlying mechanisms might play a role in combining visual and acoustic spatial information. Yet, whether and how the eardrum moves during an EMREO and whether this movement affects acoustic spatial perception remains unclear. Here we probed human acoustic lateralization performance for sounds presented at different times during a saccade (hence the EMREO) in two tasks, one relying on free-field sounds and one presenting sounds in-ear. Since EMREO generation likely involves the middle ear muscles, whose tension can alter sound transmission, it is possible that judgements of sound location may vary with the state of the EMREO at the time of sound presentation. However, when testing two specific hypotheses of how movements of the eardrum underlying the EMREO may affect spatial hearing, we found no evidence in support of this. Still, and in line with previous studies, we found that participants' lateralization responses were shaped by the spatial congruency of the saccade target direction and the sound direction. Thus, either the eardrum does not move directly as reflected by the EMREO signal, or despite its movement the underlying changes at the tympanic membrane have only minimal perceptual impact. Our results call for more refined studies to understand how the eardrum moves during a saccade and whether or how the EMREO impacts spatial perception.

15
Trial-By-Trial Auditory Brainstem Response Detection

Liu, G. S.; Ali, N.-E.-S.; O Maoileidigh, D.

2026-02-03 physiology 10.64898/2026.01.31.703019 medRxiv
Top 0.1%
1.7%

The neural response of the brainstem to brief sounds, known as the auditory brainstem response (ABR), is widely employed in the laboratory and the clinic to diagnose hearing loss. In contrast to behavioral methods that assess hearing using responses to sounds on a trial-by-trial basis, current ABR approaches are limited to analyzing the average ABR over hundreds of trials. Historically, trial-by-trial ABR analysis has not been possible owing to each trial's small signal-to-noise ratio. Here we overcome this limitation and show how to classify individual ABR trials as detected or undetected. We use the distribution of single-trial ABRs to assess supra-threshold hearing and to define psychophysics-like thresholds, which we call auditory brainstem detection (ABD) thresholds. ABD thresholds decrease as more of the ABR epoch is taken into account, whereas traditional ABR thresholds do not change. Above the ABD thresholds and below 90 dB SPL, signal detection is significantly improved by utilizing more of the ABR epoch. Our method also allows us to rank the supra-threshold hearing ability of individual subjects. Despite having normal ABR thresholds, some subjects appear to have supra-threshold hearing deficits. The trial-by-trial method demonstrates that signal detection by the ensemble of auditory neurons in the brainstem is intrinsically stochastic not only at low stimulus levels, but also at levels up to 100 dB SPL. Significance Statement: Neural responses to sound can be measured by electrodes placed on a subject's head and are commonly used in the laboratory and the clinic to assess hearing. Although the auditory system must distinguish each sound stimulus from intrinsic noise, current methods for analyzing the response of the brainstem to sound only utilize the average response to hundreds of stimuli. Here we overcome this constraint by showing how to classify an individual sound stimulus as detected or undetected based on each auditory brainstem response. This approach can assess hearing at all stimulus levels, indicates that subjects with normal hearing thresholds can exhibit supra-threshold hearing loss, and potentially extends the types of hearing deficits that can be diagnosed using auditory evoked potentials.
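
One way to realize trial-by-trial detection, sketched here as a matched-filter score compared against a criterion set from no-stimulus epochs; this is an illustrative stand-in, not necessarily the authors' classifier:

```python
import numpy as np

def classify_trials(trials: np.ndarray, template: np.ndarray,
                    noise_scores: np.ndarray, fpr: float = 0.05) -> np.ndarray:
    """trials: (n_trials, n_samples) single-trial ABR epochs;
    noise_scores: matched-filter scores from silent epochs."""
    scores = trials @ template / np.linalg.norm(template)
    criterion = np.quantile(noise_scores, 1 - fpr)   # fixes the false-positive rate
    return scores > criterion                        # detected / undetected per trial
```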

16
First Real-World Evidence Utilizing the Multidimensional Tinnitus Functional Index to Assess Treatment Impact with Bimodal Neuromodulation

Sabine, M. O.; Fligor, B. J.

2026-01-22 otolaryngology 10.64898/2026.01.20.26344445 medRxiv
Top 0.1%
1.4%

Purpose: Real-world evidence (RWE) is of practical significance as it enables the evaluation of whether findings observed in rigorously controlled clinical trial settings are generalizable to routine clinical practice. While Lenire, a bimodal neuromodulation tinnitus treatment device, has demonstrated efficacy and safety within controlled trials, further RWE from clinics is needed to reinforce these results. This is the first real-world study to assess the therapeutic effects of Lenire on tinnitus using the Tinnitus Functional Index (TFI), a multidimensional instrument designed to capture tinnitus severity and treatment responsiveness. The study correlates findings with the Tinnitus Handicap Inventory (THI), a well-established tool that assesses the perceived functional, emotional, and catastrophic impact of tinnitus and that was used in previous clinical trials and real-world studies. The use of an alternative validated outcome measure in a real-world study may add more feasible, relevant, and patient-centered research findings to the body of evidence for Lenire, while maintaining scientific credibility. Methods: A single-site, single-arm retrospective study examining patients fitted with the Lenire device was conducted. Ninety-seven patients with moderate or greater tinnitus severity used the Lenire device for 12 weeks, for up to 60 minutes a day. The primary outcome was change in tinnitus severity, assessed using the TFI at 6-week (FU1) and 12-week (FU2) follow-ups. The THI was included as a secondary outcome measure. Responder rates and mean score changes between initial assessment and FU1 and FU2 were compared using Z-tests for proportions and t-tests, respectively. Pearson correlations were used to examine the relationship between the TFI and THI change scores. Results: After 12 weeks of treatment, 73.4% [95% CI: 62.6%, 84.3%] of patients achieved a clinically significant improvement, defined as a reduction of at least 13 points on the TFI. This improvement was strongly supported by results from the THI, where 84.1% [95% CI: 75.1%, 93.2%] of patients met the minimum clinically important difference of 7 points. Mean score reductions were -25.9 (SEM 2.4) for the TFI and -28.0 (SEM 2.4) for the THI. Change scores from initial assessment to FU2 on the TFI and THI were highly correlated (r = 0.74, p < 0.001), indicating strong agreement between the two measures in capturing treatment-related improvements. All eight TFI subdomains showed reductions ranging from 18.5 to 31.4 points at FU2. Conclusions: This retrospective study demonstrates that 12 weeks of treatment with the Lenire device resulted in clinically meaningful improvements in tinnitus severity on the TFI, strongly supported by the THI. The high correlation between TFI and THI change scores indicates strong agreement between the two questionnaires in capturing treatment effects. Furthermore, all eight TFI subdomains showed notable reductions, underscoring the multidimensional impact of the treatment. These findings support the clinical utility of both the TFI and THI as complementary tools for evaluating treatment outcomes and guiding tinnitus management in routine practice.
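
The outcome statistics named in the abstract are standard; a sketch, assuming per-patient scores are available as arrays (the Wald interval below is one common choice for the responder-rate CI, not necessarily the authors'):

```python
import numpy as np
from scipy import stats

def responder_rate_ci(n_responders: int, n: int, z: float = 1.96):
    p = n_responders / n
    half = z * np.sqrt(p * (1 - p) / n)              # Wald 95% CI
    return p, (p - half, p + half)

# tfi_pre, tfi_fu2: per-patient TFI scores at baseline and 12 weeks
# t, p_val = stats.ttest_rel(tfi_pre, tfi_fu2)
# r, _ = stats.pearsonr(tfi_change, thi_change)      # agreement of change scores
```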

17
Multivariate Prediction of Conductive Dysfunction in Well and NICU Newborns using Wideband Acoustic Immittance with Acoustic Reflex Tests

Hunter, L. L.; Feeney, M. P.; Fitzpatrick, D.; Keefe, D. H.

2026-03-15 otolaryngology 10.64898/2026.03.13.26348314 medRxiv
Top 0.1%
1.4%

Objectives: The overall goal of this study was to assess tympanometric and ambient wideband acoustic immittance (WAI) tests and wideband acoustic reflex thresholds (ART) in well-baby and newborn intensive care unit (NICU) cohorts with three specific objectives: 1) assess the predictive accuracy of wideband tympanometry (WBT) and ART for conductive dysfunction in ears referring on the first or second stages of newborn hearing screening; 2) identify inadequate tests likely due to probe blockages or leaks; and 3) assess prediction models separately for well-baby and NICU screening outcomes. Design: Prospective, observational study of full-term (n=514) and premature newborns (n=239) recruited from well-baby and NICU nursery birth hospital newborn hearing screening programs. Wideband tympanometry, ambient absorbance, and acoustic reflexes were tested after Stage 1 transient otoacoustic emissions (TEOAE) screening. The reference standard for Pass or Refer groups was initially defined on the Stage 1 TEOAE test result. Pass or Refer groups were then reassigned based on the Stage 2 screening ABR for those who referred at Stage 1, and all NICU infants. Multivariate models were developed using reflectance and admittance variables to predict conductive dysfunction relative to the screening reference standard in a randomized sub-group of subjects at Stage 1 and Stage 2 screening. Classification accuracy was evaluated on a second, independent sub-group. Individual tests were classified as having inadequate probe fits if they had excessively low values of sound pressure level or susceptance (leak) or absorbance (blockage). Results: Differences in ambient absorbance for Pass vs. Refer screening groups revealed the greatest differences and effect sizes occurring in frequency bins between 1.4-2 kHz. Screening failure at both Stage 1 and 2 was most accurately predicted by models using ambient absorbance and power level variables at frequencies between 1-2.8 kHz, including ARTs. Tympanometric admittance variables at the positive-pressure tail for frequencies between 1-2.8 kHz in combination with the ART were more accurate predictors than those at peak pressure or the negative-pressure tail. Multivariate models generalized well to an independent group of infants at both Stage 1 and 2 for both the ambient and tympanometric models. Ambient tests revealed more inadequate tests than tympanometric tests, primarily due to blocked probe tips. Exclusion of ears to detect probe leaks or blockages slightly improved the ambient prediction models, but did not affect tympanometric models. Conclusion: Wideband acoustic reflex tests improved all models for ambient and tympanometric absorbance. Multivariate prediction models developed for WAI tests were repeatable in an independent group of well and NICU infants, suggesting that the results are generalizable to these populations. Detection of probe blockage or leaks slightly improved prediction for ambient measures. Pressurized tests have the advantage of ensuring probe seals due to the need for a hermetic seal, and thus are useful to ensure adequate probe insertion.

18
A meta-analysis of bone conduction 80 Hz auditory steady state response thresholds for adults and infants with normal hearing

Perugia, E.; Georga, C.

2026-02-14 otolaryngology 10.64898/2026.02.12.26346168 medRxiv
Top 0.1%
1.3%

Background: Auditory steady-state responses (ASSRs) provide an objective method for estimating hearing thresholds in individuals unable to provide behavioural responses. Bone conduction (BC) testing is required to differentiate conductive from sensorineural hearing loss. Accurate BC ASSR threshold estimation relies on "correction" factors, which are not yet well established. This meta-analysis evaluated the reliability of BC ASSR thresholds to estimate hearing thresholds at 500, 1000, 2000 and 4000 Hz. Methods: A systematic search of PubMed, the Cochrane Library, and Embase was conducted to identify studies involving normal-hearing (NH) and hearing-impaired (HI) participants of all ages. Outcomes were (1) the difference between behavioural and ASSR thresholds, and (2) ASSR thresholds. The risk of bias was evaluated using the Newcastle-Ottawa Scale. The mean and 95% confidence intervals (CI) were calculated for the thresholds at the four frequencies. The certainty of the evidence was assessed using the GRADE approach. Results: Of the records identified, 11 met the inclusion criteria, yielding a total of 27 studies. Sample sizes ranged from 60 to 249 participants across frequencies and age groups. The quality of records ranged from low to high. Data were synthesised using random-effects models due to heterogeneity. In NH adults, the mean differences (±95% CI) between BC ASSR thresholds and behavioural thresholds were 17.0 (±4.8), 15.5 (±6.0), 13.4 (±3.3), and 12.1 (±4.1) dB at 500, 1000, 2000, and 4000 Hz, respectively. In NH infants, mean (±95% CI) BC ASSR thresholds were 17.2 (±2.2), 10.5 (±3.6), 26.4 (±2.7), and 19.9 (±4.0) dB HL at the same frequencies. The certainty of the evidence was very low. Conclusions: BC ASSR can be a reliable method for estimating BC thresholds. However, age and frequency significantly impact BC ASSR thresholds, highlighting the need to develop "correction" factors to accurately predict BC behavioural thresholds. Registration: PROSPERO CRD42023422150.
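
Random-effects pooling of the kind described is most often DerSimonian-Laird; a minimal sketch for combining per-study means (the authors' software and estimator may differ):

```python
import numpy as np

def dersimonian_laird(means: np.ndarray, variances: np.ndarray):
    """Pool study means with between-study heterogeneity; returns mean and 95% CI."""
    w = 1.0 / variances                              # fixed-effect weights
    fe = np.sum(w * means) / np.sum(w)
    q = np.sum(w * (means - fe) ** 2)                # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(means) - 1)) / c)      # between-study variance
    w_re = 1.0 / (variances + tau2)
    mu = np.sum(w_re * means) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu, (mu - 1.96 * se, mu + 1.96 * se)
```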

19
The richness of little voices: using artificial intelligence to understand early language development

Petrache, M.; Carvallo, A.; Silva, V.; Barcelo, P.; Pena, M.

2026-01-31 neuroscience 10.64898/2026.01.30.702650 medRxiv
Top 0.1%
1.3%

How informative are preschoolers' speech vocalizations? Preschoolers' speech is often imprecise, highly variable and hard to interpret by humans and machines; consequently, its predictive value for later developmental outcomes remains underexplored. Here, we analyzed 6,595 brief vocalizations (0.5-5 s) from 127 preschoolers aged 3-4 years, including 74 children with diagnosed language delay, recorded in naturalistic environments. The vocalization models robustly distinguished children with and without language delay (ROC-AUC: 0.90), beyond the acoustic properties of the recordings (ROC-AUC: 0.62), and outperformed similar models analyzing metadata that the literature reports as predictive factors for early language development (ROC-AUC < 0.69 [95% CI: 0.08-0.15 to 0.48-0.73], P < 0.001). This indicates that neural networks applied to foundation-model audio vectorizations can extract meaningful developmental markers from brief samples of immature speech to classify speech status, offering a promising, scalable approach for early screening of language abilities.
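
The pipeline, in sketch form: embed each clip with a pretrained audio foundation model, then fit a lightweight classifier and report cross-validated ROC-AUC. The encoder and classifier choices below are assumptions, not the paper's specification:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def delay_screening_auc(X, y, n_folds: int = 5) -> float:
    """X: (n_clips, d) embeddings from a pretrained audio encoder;
    y: 1 = diagnosed language delay, 0 = typical development."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=n_folds, scoring="roc_auc").mean()
```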

20
EEG correlates of auditory rise time processing: A systematic review

Manasevich, V.; Kostanian, D.; Rogachev, A.; Sysoeva, O.

2026-03-09 neuroscience 10.64898/2026.03.06.710012 medRxiv
Top 0.1%
1.0%

Rise time (RT) is considered to be one of the most significant acoustical characteristics of auditory speech stimuli. A substantial amount of data has been accumulated on the neurophysiological mechanisms of RT processing under different conditions and in different groups of people, but these data have not been systematised. This review focuses on studies that have investigated electroencephalographic (EEG) markers of RT sensitivity. The literature search was conducted according to the PRISMA statement in the PubMed, Web of Science and APA PsycInfo databases. The resultant review comprised 37 studies that considered diverse aspects of RT processing. The review describes the main stimulation parameters affecting electrophysiological markers of RT processing reflected in different components of event-related potentials (ERPs), brainstem responses and cortical rhythmic activity. The main finding of this review is that rise time prolongation leads to a decrease in the amplitude of the main ERP components and an increase in their latencies. However, the sensitivity of the EEG markers varied, with the earliest components tracking subtle differences (a few tens of microseconds) and the later components coding larger ones (up to 500 ms). Nevertheless, the observed effects may vary and depend on aspects of the experimental paradigm, the age of participants and speech-related problems. Future research may benefit from addressing understudied clinical groups and ERP components such as P1 and N2, which dominate in children.